 european city


Stochastic Streets: A Walk Through Random LLM Address Generation in four European Cities

Fu, Tairan, Campo-Nazareno, David, Coronado-Blázquez, Javier, Conde, Javier, Reviriego, Pedro, Lombardi, Fabrizio

arXiv.org Artificial Intelligence

Northeastern University, Boston, USA. Abstract: Large Language Models (LLMs) are capable of solving complex math problems or answering difficult questions on almost any topic, but can they generate random street addresses for European cities? Large Language Models (LLMs) have shown impressive performance across a wide range of tasks, such as answering questions on virtually any topic. However, there remain areas in which their performance falls short, for example, seemingly simple tasks like counting the letters in a word. In this column, we explore another such challenge: generating random street addresses for four major European cities. Our results reveal that LLMs exhibit strong biases, repeatedly selecting a limited set of streets and, for some models, even specific street numbers. Surprisingly, some of the more prominent and iconic streets are not selected by the models, and the most frequent numbers in the responses lack any clear significance.


Planning in Strawberry Fields: Evaluating and Improving the Planning and Scheduling Capabilities of LRM o1

Valmeekam, Karthik, Stechly, Kaya, Gundawar, Atharva, Kambhampati, Subbarao

arXiv.org Artificial Intelligence

The ability to plan a course of action that achieves a desired state of affairs has long been considered a core competence of intelligent agents and has been an integral part of AI research since its inception. With the advent of large language models (LLMs), there has been considerable interest in the question of whether or not they possess such planning abilities, but -- despite the slew of new private and open source LLMs since GPT3 -- progress has remained slow. OpenAI claims that their recent o1 (Strawberry) model has been specifically constructed and trained to escape the normal limitations of autoregressive LLMs -- making it a new kind of model: a Large Reasoning Model (LRM). In this paper, we evaluate the planning capabilities of two LRMs (o1-preview and o1-mini) on both planning and scheduling benchmarks. We see that while o1 does seem to offer significant improvements over autoregressive LLMs, this comes at a steep inference cost, while still failing to provide any guarantees over what it generates. We also show that combining o1 models with external verifiers -- in a so-called LRM-Modulo system -- guarantees the correctness of the combined system's output while further improving performance.
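The abstract does not spell out how the verifier is wired to the model, so the following is only a minimal sketch of a generic generate-and-verify loop, assuming the verifier's critique is fed back into the next proposal. The names `generate_and_verify`, `toy_propose`, and `toy_verify` are hypothetical stand-ins, not the authors' implementation; in a real system `propose` would call the model and `verify` would call a sound plan checker.

```python
from typing import Callable, Optional, Tuple


def generate_and_verify(
    propose: Callable[[Optional[str]], str],
    verify: Callable[[str], Tuple[bool, str]],
    max_rounds: int = 5,
) -> Optional[str]:
    """Return a candidate only if the external verifier accepts it.

    `propose` receives the previous critique (None on the first round).
    `verify` returns (accepted, critique). Both are assumptions of this sketch.
    """
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        candidate = propose(feedback)
        ok, feedback = verify(candidate)
        if ok:
            return candidate  # only verified output ever leaves the loop
    return None  # budget exhausted: abstain rather than emit an unverified plan


# Hypothetical toy stand-ins for a model and a plan checker.
def toy_propose(feedback: Optional[str]) -> str:
    # First attempt omits a step; after any critique the step is added.
    return "pick-up A; stack A B" if feedback else "stack A B"


def toy_verify(plan: str) -> Tuple[bool, str]:
    if plan.startswith("pick-up A"):
        return True, ""
    return False, "block A must be picked up before stacking"


result = generate_and_verify(toy_propose, toy_verify)
```

The correctness guarantee in this sketch comes entirely from the verifier: the loop either returns an accepted candidate or returns nothing, so the quality of the combined output depends on the verifier being sound.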


NATURAL PLAN: Benchmarking LLMs on Natural Language Planning

Zheng, Huaixiu Steven, Mishra, Swaroop, Zhang, Hugh, Chen, Xinyun, Chen, Minmin, Nova, Azade, Hou, Le, Cheng, Heng-Tze, Le, Quoc V., Chi, Ed H., Zhou, Denny

arXiv.org Artificial Intelligence

We introduce NATURAL PLAN, a realistic planning benchmark in natural language containing 3 key tasks: Trip Planning, Meeting Planning, and Calendar Scheduling. We focus our evaluation on the planning capabilities of LLMs with full information on the task, by providing outputs from tools such as Google Flights, Google Maps, and Google Calendar as contexts to the models. This eliminates the need for a tool-use environment for evaluating LLMs on Planning. We observe that NATURAL PLAN is a challenging benchmark for state of the art models. For example, in Trip Planning, GPT-4 and Gemini 1.5 Pro could only achieve 31.1% and 34.8% solve rate respectively. We find that model performance drops drastically as the complexity of the problem increases: all models perform below 5% when there are 10 cities, highlighting a significant gap in planning in natural language for SoTA LLMs. We also conduct extensive ablation studies on NATURAL PLAN to further shed light on the (in)effectiveness of approaches such as self-correction, few-shot generalization, and in-context planning with long-contexts on improving LLM planning.


Six European cities tap AI to cut carbon emissions

#artificialintelligence

Helsinki, Amsterdam, Copenhagen, Paris Region, Stavanger and Tallinn will challenge companies to develop energy and mobility solutions using artificial intelligence (AI) as well as 5G, Internet of Things (IoT) and other related technologies. The initiative is part of AI4Cities, a three-year EU-funded project bringing together European cities looking for AI solutions to reduce their greenhouse gas emissions and meet climate commitments. The cities and regions will go through a pre-commercial procurement (PCP) process, which allows them to steer the development of new solutions directly towards their needs. Once they have defined their requirements, the cities will challenge start-ups, SMEs and larger companies to design solutions applying the use of AI and other technologies. Total funding of €4.6 million will be divided among the selected suppliers throughout the whole PCP process.


Cardiac arrest-detecting AI will expand to further European cities

#artificialintelligence

A startup which is able to detect cardiac arrests using artificial intelligence has announced it will be expanding its service to further European cities later this year. Corti has partnered with the European Emergency Number Association (EENA) to expand its service to four additional cities. The startup, based in Copenhagen, will select which cities it will expand its pilot to next in June. The pilot in Copenhagen has been a resounding success, as measured by the analysis released today. Based on more than 2,000 cardiac arrest emergency calls in 2014, Corti was 93 percent accurate.


Goodness of Fit in MDS and t-SNE with Shepard Diagrams

@machinelearnbot

The goodness of fit for data reduction techniques such as MDS and t-SNE can be easily assessed with Shepard diagrams. A Shepard diagram is a scatter plot that compares how far apart your data points are before and after you transform them (i.e., goodness of fit). Shepard diagrams can be used for data reduction techniques like principal components analysis (PCA), multidimensional scaling (MDS), or t-SNE. In this post, I illustrate goodness of fit with Shepard diagrams using a simple example which maps the locations of cities in Europe using t-SNE and MDS. You will see that the t-SNE approach, which is not designed to preserve all distances in the data, produces an odd-looking map of Europe and a distorted Shepard diagram.
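The data behind a Shepard diagram can be sketched in a few lines: for every pair of points, record the distance before and after the transformation. This is a minimal illustration, assuming Euclidean distance in both spaces; the function name `shepard_pairs` and the toy points are hypothetical and do not come from the post.

```python
import math
from itertools import combinations


def shepard_pairs(original, embedded):
    """Return (input distance, output distance) for every pair of points."""
    if len(original) != len(embedded):
        raise ValueError("both point lists must have the same length")
    pairs = []
    for i, j in combinations(range(len(original)), 2):
        d_in = math.dist(original[i], original[j])    # distance before reduction
        d_out = math.dist(embedded[i], embedded[j])   # distance after reduction
        pairs.append((d_in, d_out))
    return pairs


# Hypothetical toy data: three 3-D points and a 2-D projection that drops z.
original = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 5.0)]
embedded = [(0.0, 0.0), (3.0, 4.0), (0.0, 0.0)]

pairs = shepard_pairs(original, embedded)
for d_in, d_out in pairs:
    print(f"{d_in:.3f} -> {d_out:.3f}")
```

Plotting `d_in` against `d_out` gives the diagram itself: points near a straight line through the origin indicate that distances were preserved, while scatter away from it (such as the pair whose separation along z collapses to zero here) indicates distortion.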


Software detects stylistic features - The Tartan

AITopics Original Links

Carl Doersch, a doctoral student studying machine learning, and his colleagues have developed new graphical software capable of identifying stylistic features of cities. The team, composed of researchers from both Carnegie Mellon and the Institut National de Recherche en Informatique et en Automatique, has published its work in the computer graphics journal ACM Transactions on Graphics. The process of finding related patterns between images is known as visual data mining. Project collaborator Alexei Efros, an associate professor of robotics and computer science at Carnegie Mellon, pointed out that the science is still relatively new. "The field of visual data mining is still in its infancy, but I believe it holds a lot of promise," he said in a university press release.


Self-driving delivery robots could soon be common sights in European cities

#artificialintelligence

Airborne drone delivery is still more PR than public reality, but wheeled, self-driving delivery bots could be trundling down a sidewalk near you sooner than you think. London-based Starship Technologies, which counts Skype co-founders Ahti Heinla and Janus Friis among its founding team, is launching a broad testing phase of its autonomous delivery bots in parts of the UK, Germany and Switzerland starting this month. Starship's relatively small wheeled delivery bots have been in testing in select cities in 12 countries during the last nine months already, but this expansion of the trial will mark the first time the robots are being tested in actual delivery scenarios. That means they're bringing on partners to provide the delivery inventory, including food delivery players Just Eat, and London-based Pronto.co.uk. German retailer Metro Group and parcel delivery company Hermes will also take part in the pilot, which will span five cities providing deliveries to actual paying customers.